action effect
Collapsing Bandits and Their Application to Public Health Interventions
Neither (i) nor (ii) are known for general RMABs. Therefore, to capture the scheduling problems addressed in this work, we introduce a new subclass of RMABs, Collapsing Bandits, distinguished by the following feature: when an arm is played, the agent fully observes its state, "collapsing" any uncertainty, but when an arm is passive, no observation is made and uncertainty evolves.
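The collapsing behaviour described in this excerpt can be illustrated with a minimal sketch for a single two-state arm. The function name `next_belief`, the transition probabilities `p01` and `p11`, and the use of one transition model for both played and passive arms are assumptions made for illustration; they are not taken from the paper.

```python
def next_belief(belief, played, observed_state=None, p01=0.2, p11=0.9):
    """Return the probability that the arm is in state 1 at the next step.

    belief: current probability that the arm is in state 1.
    played: True if the arm is played this step (its state is observed).
    observed_state: 0 or 1, required when played is True.
    p01, p11: hypothetical probabilities of moving to state 1 from
        state 0 and from state 1 (illustrative values only).
    """
    if played:
        if observed_state not in (0, 1):
            raise ValueError("observed_state must be 0 or 1 when the arm is played")
        # Playing the arm reveals its state, so uncertainty collapses.
        belief = float(observed_state)
    # Whether or not the arm was observed, the belief then evolves
    # through the transition model for one step.
    return belief * p11 + (1.0 - belief) * p01


# A passive arm drifts: uncertainty evolves without observation.
b = 0.5
for _ in range(3):
    b = next_belief(b, played=False)

# Playing the arm and observing state 1 resets the belief to a known point.
b_after_play = next_belief(b, played=True, observed_state=1)
```

Under these assumptions, a passive arm's belief follows a deterministic chain, while each play restarts that chain from state 0 or state 1.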
Integrating Action Knowledge and LLMs for Task Planning and Situation Handling in Open Worlds
Ding, Yan, Zhang, Xiaohan, Amiri, Saeid, Cao, Nieqing, Yang, Hao, Kaminski, Andy, Esselink, Chad, Zhang, Shiqi
Task planning systems have been developed to help robots use human knowledge (about actions) to complete long-horizon tasks. Most of them have been developed for "closed worlds" while assuming the robot is provided with complete world knowledge. However, the real world is generally open, and the robots frequently encounter unforeseen situations that can potentially break the planner's completeness. Could we leverage the recent advances on pre-trained Large Language Models (LLMs) to enable classical planning systems to deal with novel situations? This paper introduces a novel framework, called COWP, for open-world task planning and situation handling. COWP dynamically augments the robot's action knowledge, including the preconditions and effects of actions, with task-oriented commonsense knowledge. COWP embraces the openness from LLMs, and is grounded to specific domains via action knowledge. For systematic evaluations, we collected a dataset that includes 1,085 execution-time situations. Each situation corresponds to a state instance wherein a robot is potentially unable to complete a task using a solution that normally works. Experimental results show that our approach outperforms competitive baselines from the literature in the success rate of service tasks. Additionally, we have demonstrated COWP using a mobile manipulator. Supplementary materials are available at: https://cowplanning.github.io/
On the Challenges of using Reinforcement Learning in Precision Drug Dosing: Delay and Prolongedness of Action Effects
Basu, Sumana, Legault, Marc-André, Romero-Soriano, Adriana, Precup, Doina
Drug dosing is an important application of AI, which can be formulated as a Reinforcement Learning (RL) problem. In this paper, we identify two major challenges of using RL for drug dosing: delayed and prolonged effects of administering medications, which break the Markov assumption of the RL framework. We focus on prolongedness and define PAE-POMDP (Prolonged Action Effect-Partially Observable Markov Decision Process), a subclass of POMDPs in which the Markov assumption does not hold specifically due to prolonged effects of actions. Motivated by the pharmacology literature, we propose a simple and effective approach to converting drug dosing PAE-POMDPs into MDPs, enabling the use of the existing RL algorithms to solve such problems. We validate the proposed approach on a toy task, and a challenging glucose control task, for which we devise a clinically-inspired reward function. Our results demonstrate that: (1) the proposed method to restore the Markov assumption leads to significant improvements over a vanilla baseline; (2) the approach is competitive with recurrent policies which may inherently capture the prolonged effect of actions; (3) it is remarkably more time and memory efficient than the recurrent baseline and hence more suitable for real-time dosing control systems; and (4) it exhibits favorable qualitative behavior in our policy analysis.
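The abstract does not spell out how the PAE-POMDP is converted into an MDP. One way such a conversion could look, sketched here purely as an assumption, is to summarise past doses in a single exponentially decaying residual and append it to the observation, so that the augmented state carries the lingering effect of earlier actions. The class `ResidualDoseTracker`, the helper `augment`, and the `decay` parameter are hypothetical names, not the authors' method.

```python
class ResidualDoseTracker:
    """Tracks a decaying summary of past doses (illustrative assumption)."""

    def __init__(self, decay):
        if not 0.0 <= decay < 1.0:
            raise ValueError("decay must be in [0, 1)")
        self.decay = decay      # fraction of the residual that survives one step
        self.residual = 0.0     # accumulated effect of earlier doses

    def update(self, dose):
        # The old residual decays, then the new dose is added on top.
        self.residual = self.decay * self.residual + dose
        return self.residual


def augment(observation, residual):
    """Append the residual to the raw observation to form the agent's state."""
    return tuple(observation) + (residual,)


tracker = ResidualDoseTracker(decay=0.5)
doses = [4.0, 0.0, 1.0]
glucose_readings = [180.0, 150.0, 130.0]  # made-up observations
states = [augment((g,), tracker.update(d)) for g, d in zip(glucose_readings, doses)]
```

With the residual in the state, a standard feed-forward policy can condition on the remaining effect of earlier doses without a recurrent network.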
Robot Task Planning and Situation Handling in Open Worlds
Ding, Yan, Zhang, Xiaohan, Amiri, Saeid, Cao, Nieqing, Yang, Hao, Esselink, Chad, Zhang, Shiqi
Automated task planning algorithms have been developed to help robots complete complex tasks that require multiple actions. Most of those algorithms have been developed for "closed worlds" assuming complete world knowledge is provided. However, the real world is generally open, and the robots frequently encounter unforeseen situations that can potentially break the planner's completeness. This paper introduces a novel algorithm (COWP) for open-world task planning and situation handling that dynamically augments the robot's action knowledge with task-oriented common sense. In particular, common sense is extracted from Large Language Models based on the current task at hand and robot skills. For systematic evaluations, we collected a dataset that includes 561 execution-time situations in a dining domain, where each situation corresponds to a state instance of a robot being potentially unable to complete a task using a solution that normally works. Experimental results show that our approach significantly outperforms competitive baselines from the literature in the success rate of service tasks. Additionally, we have demonstrated COWP using a mobile manipulator. Supplementary materials are available at: https://cowplanning.github.io/
A Causal-based Approach to Explain, Predict and Prevent Failures in Robotic Tasks
Diehl, Maximilian, Ramirez-Amaro, Karinne
Robots working in real environments need to adapt to unexpected changes to avoid failures. This is an open and complex challenge that requires robots to predict and identify the causes of failures in time to prevent them. In this paper, we present a causal method that will enable robots to predict when errors are likely to occur and prevent them from happening by executing a corrective action. First, we propose a causal-based method to detect the cause-effect relationships between task executions and their consequences by learning a causal Bayesian network (BN). The obtained model is transferred from simulated data to real scenarios to demonstrate the robustness and generalization of the obtained models. Based on the causal BN, the robot can predict if and why the executed action will succeed or not in its current state. Then, we introduce a novel method that finds the closest state alternatives through a contrastive Breadth-First-Search if the current action was predicted to fail. We evaluate our approach for the problem of stacking cubes in two cases: (a) single stacks (stacking one cube) and (b) multiple stacks (stacking three cubes). In the single-stack case, our method was able to reduce the error rate by 97%. We also show that our approach can scale to capture multiple actions in one model, which allows measuring time-shifted action effects, such as the impact of an imprecise stack of the first cube on the stacking success of the third cube. For these complex situations, our model was able to prevent around 75% of the stacking errors, even for the challenging multiple-stack scenario. This demonstrates that our method is able to explain, predict, and prevent execution failures, and that it scales to complex scenarios that require an understanding of how the action history impacts future actions.
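The search for the closest state alternative can be sketched as a plain breadth-first search over discretised states. This is a simplified illustration under stated assumptions, not the authors' implementation: states are tuples of integers, a neighbour differs by one step in one variable, and the hypothetical callback `predicts_success` stands in for a query to the learned causal BN.

```python
from collections import deque


def closest_successful_state(start, predicts_success, bounds):
    """Breadth-first search for the nearest state predicted to succeed.

    start: tuple of ints, the current (discretised) state.
    predicts_success: hypothetical callable standing in for the causal BN.
    bounds: list of (low, high) inclusive limits, one per state variable.
    Returns the first successful state found, or None if there is none.
    """
    if predicts_success(start):
        return start
    queue = deque([start])
    seen = {start}
    while queue:
        state = queue.popleft()
        for i, (low, high) in enumerate(bounds):
            for delta in (-1, 1):
                value = state[i] + delta
                if not low <= value <= high:
                    continue
                neighbor = state[:i] + (value,) + state[i + 1:]
                if neighbor in seen:
                    continue
                if predicts_success(neighbor):
                    # BFS visits states in order of distance, so this is
                    # a closest alternative in number of unit changes.
                    return neighbor
                seen.add(neighbor)
                queue.append(neighbor)
    return None


# Toy predictor: stacking succeeds when both offsets are within one unit.
def toy_predictor(state):
    x_offset, y_offset = state
    return abs(x_offset) <= 1 and abs(y_offset) <= 1


alternative = closest_successful_state((3, 0), toy_predictor, [(-3, 3), (-3, 3)])
```

Comparing the failing state with the returned alternative shows which variables had to change, which is what makes the search contrastive in this sketch.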
Piacentini
Compilation techniques in planning reformulate a problem into an alternative encoding for which efficient, off-the-shelf solvers are available. In this work, we present a novel mixed-integer linear programming (MILP) compilation for cost-optimal numeric planning with instantaneous actions. While recent works on the problem are restricted to actions that modify variables present in simple numeric conditions, our MILP formulation, in addition, handles linear conditions and linear action effects on numeric state variables. Such problems are particularly challenging due to the state-dependency of the action effects. Experiments show that our approach, in addition to being the state of the art for the more general problem class, is competitive with heuristic search-based planners on domains with only simple numeric conditions.
An Empirical Comparison of PDDL-based and ASP-based Task Planners
Jiang, Yuqian, Zhang, Shiqi, Khandelwal, Piyush, Stone, Peter
General purpose planners enable AI systems to solve many different types of planning problems. However, many different planners exist, each with different strengths and weaknesses, and there are no general rules for which planner would be best to apply to a given problem. In this paper, we empirically compare the performance of state-of-the-art planners that use either the Planning Domain Description Language (PDDL), or Answer Set Programming (ASP) as the underlying action language. PDDL is designed for automated planning, and PDDL-based planners are widely used for a variety of planning problems. ASP is designed for knowledge-intensive reasoning, but can also be used for solving planning problems. Given domain encodings that are as similar as possible, we find that PDDL-based planners perform better on problems with longer solutions, and ASP-based planners are better on tasks with a large number of objects or in which complex reasoning is required to reason about action preconditions and effects. The resulting analysis can inform selection among general purpose planning systems for a particular domain.
Decidable Verification of Golog Programs over Non-Local Effect Actions
Zarrieß, Benjamin (Technische Universität Dresden) | Claßen, Jens (RWTH Aachen University)
The Golog action programming language is a powerful means to express high-level behaviours in terms of programs over actions defined in a Situation Calculus theory. In particular for physical systems, verifying that the program satisfies certain desired temporal properties is often crucial, but undecidable in general, the latter being due to the language's high expressiveness in terms of first-order quantification, range of action effects, and program constructs. So far, approaches to achieve decidability involved restrictions where action effects either had to be context-free (i.e. not depend on the current state), local (i.e. only affect objects mentioned in the action's parameters), or at least bounded (i.e. only affect a finite number of objects). In this paper, we introduce two new, more general classes of action theories that allow for context-sensitive, non-local, unbounded effects, i.e. actions that may affect an unbounded number of possibly unnamed objects in a state-dependent fashion. We contribute to the further exploration of the boundary between decidability and undecidability for Golog, showing that for our new classes of action theories in the two-variable fragment of first-order logic, verification of CTL* properties of programs over ground actions is decidable.